July 2, 2025
In this workshop, we will learn about:
The console (by default at the bottom left in RStudio) is where most of the action happens. In the console, we can use R interactively. We write a command and then execute it by pressing Enter.
In its most basic use, R can be a calculator. Try executing the following commands:
10 - 2
[1] 8
3 * 4
[1] 12
2 + 10 / 5
[1] 4
Those symbols are called “binary operators”: we can use them to multiply, divide, add and subtract. Once we execute the command (the “input”), we can see the result in the console (the “output”).
What if we want to keep reusing the same value? We can store data by creating objects, and assigning values to them with the assignment operator <-:
You can use the shortcut Alt+- to type the assignment operator more quickly.
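As a minimal sketch of assignment, the commands below create two objects and reuse the first one to compute the second. The name num2 appears later in this workshop; num1 and the values are illustrative assumptions.

```r
# Store the value 42 in an object called num1 (illustrative name and value)
num1 <- 42
# Reuse num1 to compute a second object
num2 <- num1 / 9
# Typing the name of an object prints its value
num2
```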
We can also store text data:
You should now see your objects listed in your Environment pane (top right).
As you can see, you can store different kinds of data as objects. If you want to store text data (a “string of characters”), you have to use quotes around them.
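A short sketch of storing text, assuming an object called sentence (the name and the text are made up for illustration). The class() function reports what kind of data an object holds.

```r
# Text must be wrapped in quotes (object name and text are illustrative)
sentence <- "Hello World!"
sentence
# class() reports the kind of data stored: text is "character"
class(sentence)
# A number without quotes is "numeric"
class(8)
```

Without the quotes, R would look for objects called Hello and World instead of storing the text.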
You can recall your recent commands with the up arrow, which is especially useful to correct typos or slightly modify a long command.
Using the console is great to test things and quickly run commands and get outputs. However, if we want to store our process and refine our code as we go over several sessions, it is best to work with a script. Let’s do a bit more setting up of our project.
To keep it tidy, we are creating 3 folders in our project directory:
You can do that with the “New Folder” button in the “Files” pane (bottom right of the window).
Scripts are simple text files that contain R code. They are useful for:
Let’s create a new R script with the menu: File > New File > R Script. (This can also be done with the first icon in the toolbar, or with the shortcut Ctrl+Shift+N.)
This opens our fourth pane in the top left of RStudio: the source pane.
Now, add some commands to your script:
Notice the colours? This is called syntax highlighting. This is one of the many ways RStudio makes it more comfortable to work with R. The code is more readable when working in a script.
While editing your script, you can run the current command (or the selected block of code) by using Ctrl+Enter. Remember to save your script regularly with the shortcut Ctrl+S. You can find more shortcuts with Alt+Shift+K, or the menu “Tools > Keyboard Shortcuts Help”.
An R function is a little program that does a particular job. It usually looks like this:
<functionname>(<argument(s)>)
Arguments tell the function what to do. Some functions don’t need arguments, others need one or several, but they always need the parentheses after their name.
For example, try running the following command:
The round() function rounds a number to the closest integer. The only argument we give it is num2, the number we want to round.
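The command described above can be sketched as follows, assuming num2 holds a decimal number (the value 42 / 9 is an illustrative choice).

```r
# num2 is assumed to hold a decimal number, here 42 / 9 (about 4.67)
num2 <- 42 / 9
# With a single argument, round() returns the closest integer
round(num2)
```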
If you scroll back to the top of your console, you will now be able to spot functions in the text.
What if we want to learn more about a function?
There are two main ways to find help about a specific function in RStudio:
?functionname
Let’s look through the documentation for the round() function:
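A sketch of two equivalent ways to open that page from the console: the ? shortcut mentioned above, and the help() function, which takes the function name as its argument.

```r
# Open the documentation page for round()
?round
# help() opens the same page, with the function name given as an argument
help(round)
```

In RStudio, the page is displayed in the Help tab.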
As you can see, different functions might share the same documentation page.
There is quite a lot of information in a function’s documentation, but the most important bits are:
See how the round() function has a second argument available? Try this now:
We can change the default behaviour of the function by telling it how many digits we want after the decimal point, using the argument digits. And if we use the arguments in order, we don’t need to name them:
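A minimal sketch of both forms, again assuming num2 holds 42 / 9 (an illustrative value).

```r
num2 <- 42 / 9
# Name the argument explicitly to keep two digits after the decimal point
round(num2, digits = 2)
# Same result: arguments given in the documented order do not need names
round(num2, 2)
```

Naming arguments is longer to type, but it makes the command easier to read later on.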
To group values together in a single object, use the c() function.
c() combines the arguments into a vector. In other words, it takes any number of arguments (hence the ...), and stores all those values together, as one single object. For example, let’s store the ages of our pet dogs in a new object:
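A sketch of such a vector, assuming five dogs whose ages are made up for illustration. One age is unknown and recorded as NA.

```r
# Ages of five hypothetical dogs; NA marks an age we do not know
ages <- c(4, 10, 2, NA, 3)
ages
# length() counts the elements, including the missing one
length(ages)
```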
You can store missing data as NA.
We can now reuse this vector, and calculate their human age:
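A sketch of that calculation, assuming the common rule of thumb of seven human years per dog year and the same illustrative ages vector.

```r
ages <- c(4, 10, 2, NA, 3)
# The multiplication is applied to every element; the missing age stays NA
human_ages <- ages * 7
human_ages
```

R applies the operation to the whole vector at once, so no loop is needed.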
R can create visualisations with functions too. Try a bar plot of your dogs’ ages with the barplot() function:
We can customise the plot with a title and some colours, for example:
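A sketch of both plots, using the illustrative ages vector. The title text and the colour are arbitrary choices; main and col are standard barplot() arguments.

```r
ages <- c(4, 10, 2, NA, 3)
# Basic bar plot: one bar per value, with an empty slot for the NA
barplot(ages)
# Same plot with a title (main) and a fill colour (col)
barplot(ages, main = "Age of my dogs", col = "pink")
```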
Use the help pages to find out what these functions do, and try executing commands with them:
rep.int()
mean()
rm()
rep.int() creates vectors like c(), but it is designed to easily replicate values. For example, if you find something very funny:
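A command consistent with the output that follows, which contains 30 copies of the same string, could look like this.

```r
# Repeat the string "Ha!" 30 times and print the resulting vector
rep.int("Ha!", 30)
```

The numbers in square brackets at the start of each output line give the position of the first element printed on that line.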
[1] "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!"
[13] "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!"
[25] "Ha!" "Ha!" "Ha!" "Ha!" "Ha!" "Ha!"
The next function, mean(), returns the mean of a vector of numbers:
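A sketch using the illustrative ages vector, which contains one missing value.

```r
ages <- c(4, 10, 2, NA, 3)
# With default settings, the result is NA rather than a number
mean(ages)
```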
What happened there?
We have an NA value in the vector, which means the function can’t tell what the mean is. If we want to change this default behaviour, we can use an extra argument: na.rm, which stands for “remove NAs”.
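A sketch of the same call with na.rm switched on, again on the illustrative ages vector.

```r
ages <- c(4, 10, 2, NA, 3)
# Drop the missing value, then average the four known ages: (4 + 10 + 2 + 3) / 4
mean(ages, na.rm = TRUE)
```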
In our last command, if we hadn’t named the na.rm argument, R would have understood TRUE to be the value for the trim argument!
rm() removes an object from your environment (remove() and rm() point to the same function). For example:
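A minimal sketch, using a throwaway object whose name is made up for illustration.

```r
# Create a throwaway object (illustrative name), then remove it
temp_value <- 99
rm(temp_value)
# exists() confirms that the object is gone
exists("temp_value")
```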
R does not check if you are sure you want to remove something! As a programming language, it does what you ask it to do, which means you might have to be more careful. But you’ll see later on that, when working with scripts, this is less of a problem.
Let’s do some more complex operations by combining two functions:
ls() returns a character vector: it contains the names of all the objects in the current environment (i.e. the objects we created in this R session). Notice that this function doesn’t require us to provide any argument, but we still need to write the parentheses to run the function.
Is there a way we could combine ls() with rm()?
You can remove all the objects in the environment by using ls() as the value for the list argument:
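A sketch of the combination, with two illustrative objects created first so that there is something to remove. Note that this clears everything in your environment.

```r
# Create two illustrative objects
first_value <- 1
second_value <- "two"
# ls() returns the names of the objects as a character vector
ls()
# Pass that vector to the list argument of rm() to remove everything
rm(list = ls())
# The environment is now empty
ls()
```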
We are nesting a function inside another one. More precisely, we are using the output of the ls() function as the value passed on to the list argument in the rm() function.
If you leave a command unfinished, by leaving off the last bracket ) for example, R won’t necessarily give you an error. Instead of running the code, it waits for the rest of the command before giving you an output, and shows a + at the start of the line in the console, in place of the usual > prompt.
If you give R further instructions at this point, it treats them as a continuation of the unfinished command, so it will likely just keep showing + symbols and not return anything. To stop this, click on the console and press the Esc key on your keyboard.
We’ve practised how to find help about functions we know the name of. What if we don’t know what the function is called? Or if we want general help about R?
help.start() is a good starting point: it opens a browser of official R help.
To search the help pages for a keyword, use the ?? syntax. For example, try executing ??anova.
Let’s bring in some data. Download our gapminder dataset and save it in the data/ folder you just created.
Once you’ve got the data, use the read.csv() function to bring it into R:
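A self-contained sketch of read.csv(), assuming a tiny stand-in file written to a temporary location so that the example runs anywhere. In the workshop, the path would instead point at the file saved in the data/ folder. The object name gapminder_sample is an assumption; the two rows are copied from the dataset preview shown further down.

```r
# Write a tiny CSV file to a temporary location, as a stand-in for the download
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,pop,continent,lifeExp,gdpPercap",
  "Afghanistan,1952,8425333,Asia,28.801,779.4453",
  "Afghanistan,1957,9240934,Asia,30.332,820.8530"
), csv_path)
# read.csv() reads the file into a data frame, which we store as an object
gapminder_sample <- read.csv(csv_path)
gapminder_sample
```

The first line of the file is used for the column names, and each following line becomes one row.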
We have downloaded a CSV file from the Internet, and read it into an object called gapminder.
You can type the name of your new object to print it to screen:
That’s a lot of lines printed to your console. To have a look at the first few lines only, we can use the head() function:
country year pop continent lifeExp gdpPercap
1 Afghanistan 1952 8425333 Asia 28.801 779.4453
2 Afghanistan 1957 9240934 Asia 30.332 820.8530
3 Afghanistan 1962 10267083 Asia 31.997 853.1007
4 Afghanistan 1967 11537966 Asia 34.020 836.1971
5 Afghanistan 1972 13079460 Asia 36.088 739.9811
6 Afghanistan 1977 14880372 Asia 38.438 786.1134
Now let’s use a few functions to learn more about our dataset:
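A sketch of functions that report this kind of information, run on a three-row stand-in built from the rows printed above. The object name gapminder_sample is an assumption, so the row count here differs from the 1704 rows of the full dataset.

```r
# Three-row stand-in for the full dataset (single values are recycled)
gapminder_sample <- data.frame(
  country = "Afghanistan",
  year = c(1952L, 1957L, 1962L),
  pop = c(8425333, 9240934, 10267083),
  continent = "Asia",
  lifeExp = c(28.801, 30.332, 31.997),
  gdpPercap = c(779.4453, 820.8530, 853.1007)
)
class(gapminder_sample) # what kind of object it is
nrow(gapminder_sample)  # number of rows
ncol(gapminder_sample)  # number of columns
dim(gapminder_sample)   # both dimensions: rows, then columns
names(gapminder_sample) # column names
```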
[1] "data.frame"
[1] 1704
[1] 6
[1] 1704 6
[1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
All the information we just saw (and more) is available with one single function:
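The layout of the output that follows matches what str() prints for a data frame. A sketch on a small stand-in (the object name gapminder_sample is an assumption) could look like this.

```r
# Small stand-in with one text, one integer and one decimal column
gapminder_sample <- data.frame(
  country = c("Afghanistan", "Afghanistan"),
  year = c(1952L, 1957L),
  lifeExp = c(28.801, 30.332)
)
# str() shows the class, the dimensions, and each column's type and first values
str(gapminder_sample)
```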
'data.frame': 1704 obs. of 6 variables:
$ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
$ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
$ pop : num 8425333 9240934 10267083 11537966 13079460 ...
$ continent: chr "Asia" "Asia" "Asia" "Asia" ...
$ lifeExp : num 28.8 30.3 32 34 36.1 ...
$ gdpPercap: num 779 821 853 836 740 ...
RStudio’s Environment pane already shows us some of that information (click on the blue arrow next to the object name).
And to explore the data in a viewer, click on the table icon next to the object in the Environment pane.
This viewer allows you to explore your data by scrolling through, searching terms, filtering rows and sorting the data. Remember that it is only a viewer: it will never modify your original object.
Notice that RStudio actually runs the View() function. Feel free to use that instead of clicking on the button, but note that the case matters: using a lowercase “v” will yield an error.
To see summary statistics for each of our variables, you can use the summary() function:
country year pop continent
Length:1704 Min. :1952 Min. :6.001e+04 Length:1704
Class :character 1st Qu.:1966 1st Qu.:2.794e+06 Class :character
Mode :character Median :1980 Median :7.024e+06 Mode :character
Mean :1980 Mean :2.960e+07
3rd Qu.:1993 3rd Qu.:1.959e+07
Max. :2007 Max. :1.319e+09
lifeExp gdpPercap
Min. :23.60 Min. : 241.2
1st Qu.:48.20 1st Qu.: 1202.1
Median :60.71 Median : 3531.8
Mean :59.47 Mean : 7215.3
3rd Qu.:70.85 3rd Qu.: 9325.5
Max. :82.60 Max. :113523.1
Notice how categorical and numerical variables are handled differently?
Let’s now plot the relationship between GDP per capita and life expectancy:
plot(gapminder$gdpPercap, gapminder$lifeExp,
xlab = "GDP per capita (USD)",
ylab = "Life expectancy (years)")
For more on visualisations, we will later dive into the popular ggplot2 package.
Finally, let’s fit a linear model to see how strongly correlated the two variables are:
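A sketch of the model fit, following the formula style of the Call line in the output below. It runs on a six-row stand-in built from the rows printed earlier (the object name gapminder_sample is an assumption), so its numbers will not match the output for the full dataset.

```r
# Six-row stand-in: life expectancy and GDP per capita for Afghanistan
gapminder_sample <- data.frame(
  lifeExp = c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438),
  gdpPercap = c(779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134)
)
# Fit life expectancy as a linear function of GDP per capita
model <- lm(gapminder_sample$lifeExp ~ gapminder_sample$gdpPercap)
# summary() prints the coefficients, their p-values and the R-squared
summary(model)
```

In the formula, the variable on the left of ~ is the one being explained, and the variable on the right is the predictor.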
Call:
lm(formula = gapminder$lifeExp ~ gapminder$gdpPercap)
Residuals:
Min 1Q Median 3Q Max
-82.754 -7.758 2.176 8.225 18.426
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.396e+01 3.150e-01 171.29 <2e-16 ***
gapminder$gdpPercap 7.649e-04 2.579e-05 29.66 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.49 on 1702 degrees of freedom
Multiple R-squared: 0.3407, Adjusted R-squared: 0.3403
F-statistic: 879.6 on 1 and 1702 DF, p-value: < 2.2e-16
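The very small p-value suggests that the relationship between the two is statistically significant. The R-squared of about 0.34 tells us how strong it is: GDP per capita explains roughly a third of the variation in life expectancy in this linear model.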
Packages add functionalities to R and RStudio. There are more than 21000 available.
You can see the list of installed packages in your “Packages” tab, or by using the library() function without any argument.
We are going to install a package called “skimr”. We can do that in the Packages tab:
Notice how it runs an install.packages() command in the console? You can use that too.
If I now try running the command skim(), I get an error. That’s because, even though the package is installed, I need to load it every time I start a new R session. The library() function does that. Let’s load the package, and use the skim() function to get an augmented summary of our gapminder dataset:
library(skimr) # load the package into the current R session
skim(gapminder) # use a function from the package
Name | gapminder |
Number of rows | 1704 |
Number of columns | 6 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
country | 0 | 1 | 4 | 24 | 0 | 142 | 0 |
continent | 0 | 1 | 4 | 8 | 0 | 5 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1 | 1979.50 | 17.27 | 1952.00 | 1965.75 | 1979.50 | 1993.25 | 2007.0 | ▇▅▅▅▇ |
pop | 0 | 1 | 29601212.33 | 106157896.75 | 60011.00 | 2793664.00 | 7023595.50 | 19585221.75 | 1318683096.0 | ▇▁▁▁▁ |
lifeExp | 0 | 1 | 59.47 | 12.92 | 23.60 | 48.20 | 60.71 | 70.85 | 82.6 | ▁▆▇▇▇ |
gdpPercap | 0 | 1 | 7215.33 | 9857.45 | 241.17 | 1202.06 | 3531.85 | 9325.46 | 113523.1 | ▇▁▁▁▁ |
This function provides further summary statistics, and even displays a small histogram for each numeric variable.
Packages are essential to use R to its full potential, by making the most out of what other users have created and shared with the community. To get an idea of some of the most important packages depending on your field of study, you can start with the CRAN Task Views.
For a bit of fun:
say()
.paste()
function and its arguments.)
You can close RStudio after making sure that you saved your script.
When you create a project in RStudio, you create an .Rproj file that gathers information about the state of your project. When you close RStudio, you have the option to save your workspace (i.e. the objects in your environment) as an .RData file. The .RData file is used to reload your workspace when you open your project again. Projects also bring back whatever source file (e.g. script) you had open, and your command history. You will find your command history in the “History” tab (upper right panel): all the commands that we used should be in there.
If you have a script that contains all your work, it is a good idea not to save your workspace: it makes it less likely to run into errors because of accumulating objects. The script will allow you to get back to where you left it, by executing all the clearly laid-out steps.
The console, on the other hand, only shows a brand new R session when you reopen RStudio. Sessions are not persistent, and a clean one is started when you open your project again, which is why you have to load any extra package your work requires again with the library() function.
Comments
We should start with a couple of comments, to document our script. Comments start with #, and will be ignored by R:
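A sketch of what commented code can look like. The wording of the comments and the object name total are illustrative assumptions.

```r
# Description: practising the basics of R (illustrative header comment)
# A comment can sit on its own line...
total <- 10 - 2 # ...or follow a command on the same line
total
```

R runs the command and skips everything after the # on each line.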